Lab 05 – Hypothesis testing

ENVX1002 Handbook

The University of Sydney
Published

Semester 1, 2025

Welcome

Learning outcomes
  • Learn to use R to calculate a 1-sample t-test
  • Apply the steps for hypothesis testing from lectures
  • Learn how to interpret statistical output

Before you begin

You can download the data

  1. From module 5 in Canvas
  2. ENVX1002_Data5.xlsx, if you are viewing the HTML file from GitHub: https://Github.com/envx-resources

Create a new project

Reminder (skip to step 2 if you are going to use the directory you created in your tutorial)

Step 1: Create a new project for the practical and put it in your ENVX1002 folder: File > New Project > New Directory > New Project.

Step 2: Download the data files from Canvas or using the link above, and copy them into your project directory.

I recommend that you make a data folder in your project directory to keep things tidy! If you make a data folder in your project directory you will need to indicate this path before the file name.
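As a small sketch of what "indicating the path" looks like, the snippet below builds the path to the workbook and checks that the file is where R expects it. It assumes the folder is called data, as in the solutions later in this lab.

```r
# Build the path to the workbook inside a "data" sub-folder of the project.
data_path <- file.path("data", "ENVX1002_Data5.xlsx")
data_path

# file.exists() returns TRUE only if the file is really there, which is a
# quick way to catch a wrong folder or a typo in the file name.
file.exists(data_path)
```

If file.exists() returns FALSE, check that the project is open and that the file was copied into the data folder.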

Step 3: Open a new Quarto file.

That is, File > New File > Quarto Document, then save it immediately with File > Save.

Problems with your personal computer and R

NOTE: If you are having problems with R on your personal computer that cannot easily be solved by a demonstrator, please use the Lab PCs.

Installing packages

Remember: all of the functions and data sets in R are organised into packages. The standard (or base) packages are part of the R source code, so their functions and data sets are available automatically when R is opened. There are also many contributed packages, written by many different authors, often to implement methods that are not available in the base packages. If you cannot find a method in the base packages, you might find it in a contributed package. Many contributed packages can be downloaded from the Comprehensive R Archive Network (CRAN) site (http://cran.r-project.org/); click on Packages on the left-hand side. We will download two packages in this class using the install.packages command, and then load each package into R using the library command.

Alternatively, in RStudio click on the Packages tab > Install > type in package name > click install.
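A minimal sketch of the install-then-load pattern is shown below. The handbook does not name the two packages at this point, so readxl and ggplot2 are an assumption based on the solutions later in this lab; the CRAN mirror URL is likewise just one common choice.

```r
# Packages assumed for this lab (not named in the text above):
# readxl for reading Excel sheets, ggplot2 for plotting.
pkgs <- c("readxl", "ggplot2")

# Install only the packages that are not already installed.
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Load each package for the current session.
library(readxl)
library(ggplot2)
```

Installing is needed only once per computer, whereas library() must be run in every new R session (or at the top of every Quarto document).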

Exercise 1: 1-sample t-test Milk Yield - Walk through

This exercise will walk you through how to test a hypothesis, check assumptions and eventually draw a conclusion on your initial hypothesis. 100 cows have their milk yield measured. Suppose we wish to test whether these milk yields (units unknown) differ significantly from the economic threshold of 11 units. (The units may possibly be litres of milk produced on a particular day).

Fact

The average Australian drinks about 100 litres of milk per year. The average cow produces between 12 and 30 litres of milk per day.

The data is in the Milk sheet found in the ENVX1002_Data5.xlsx file. You will follow the steps as outlined in the lectures:

  1. Choose level of significance (α)
  2. Write null and alternate hypotheses
  3. Check assumptions (normal)
  4. Calculate test statistic
  5. Obtain P-value or critical value
  6. Make statistical conclusion
  7. Write a scientific (biological) conclusion

Remember: you can recall the steps above using HATPC.

Let's go:

1. Normally you choose 0.05 as a level of significance:

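This value is generally accepted in the scientific community. The choice is also linked to type 2 errors: a lower significance level reduces the chance of a type 1 error (rejecting a true null hypothesis) but increases the likelihood of a type 2 error (failing to reject a false null hypothesis).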
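The trade-off between the significance level and type 2 errors can be sketched with power.t.test() from base R. The sample size, difference and standard deviation below are hypothetical values chosen only for illustration; they do not come from the milk data.

```r
# Hypothetical scenario: 20 observations, a true mean that differs from the
# hypothesised mean by 0.5 units, and a standard deviation of 1 unit.
n <- 20
delta <- 0.5
sd_x <- 1

# power.t.test() returns the power of the test, i.e. 1 - P(type 2 error).
power_05 <- power.t.test(n = n, delta = delta, sd = sd_x,
                         sig.level = 0.05, type = "one.sample")$power
power_01 <- power.t.test(n = n, delta = delta, sd = sd_x,
                         sig.level = 0.01, type = "one.sample")$power

# Type 2 error rate (beta) at each significance level.
beta_05 <- 1 - power_05
beta_01 <- 1 - power_01
c(alpha_0.05 = beta_05, alpha_0.01 = beta_01)
```

With everything else held fixed, the type 2 error rate at α = 0.01 is larger than at α = 0.05.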

2. Write null and alternative hypotheses:


Question: Write down the null hypothesis and alternative hypotheses:
H0: < Type your answer here >
H1: < Type your answer here >


Solution


Question: Write down the null hypothesis and alternative hypotheses:
H0: \(\mu_{yield}\) = 11 units
H1: \(\mu_{yield}\) \(\neq\) 11 units


3. Check assumptions (normality):

a. load data:

Make sure you set your working directory first

# Type your R code here

Solution

library(readxl)
milk <- read_excel("data/ENVX1002_Data5.xlsx", sheet = "Milk")

It is always good practice to look at the data first to make sure you have the correct data, that it loaded correctly, and that you know the names of the columns. You can do this by typing the name of the data object (milk) or, for large datasets, by using str() to show the structure of the data or head() to show the first 6 rows:

# Type your R code here

Solution

str(milk)
tibble [100 × 1] (S3: tbl_df/tbl/data.frame)
 $ Yield: num [1:100] 18.5 15.9 13.1 15.1 5.7 9.4 15.3 17.6 18.4 22 ...
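Several base R functions give a quick look at a data set. The sketch below uses only the first ten yields printed by str(milk) above as a small stand-in, so it runs without the Excel file; milk_demo is a hypothetical name, not an object used elsewhere in the lab.

```r
# The first ten yields shown in the str(milk) output, as a stand-in data set.
milk_demo <- data.frame(
  Yield = c(18.5, 15.9, 13.1, 15.1, 5.7, 9.4, 15.3, 17.6, 18.4, 22)
)

str(milk_demo)      # structure: number of rows, column names and types
head(milk_demo)     # first 6 rows
names(milk_demo)    # column names only
summary(milk_demo)  # minimum, quartiles, mean and maximum of each column
```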

b. Tests for normality:

qqplots:

# Type your R code here

Solution

#Load library
library(ggplot2)

ggplot(milk, aes(sample = Yield)) +
  stat_qq() +
  stat_qq_line()

Histogram and boxplots:

# Type your R code here

Solution

#Histogram
ggplot(milk, aes(x = Yield)) +
  geom_histogram(binwidth = 1, fill = "lightblue", color = "black") +
  labs(title = "Histogram of Milk Yield", x = "Yield", y = "Frequency")

#Boxplot
ggplot(milk, aes(x = Yield)) +
  geom_boxplot(fill = "lightblue", color = "black")


Question: Do the plots indicate the data are normally distributed?
Answer: < Type your answer here >


Solution


Question: Do the plots indicate the data are normally distributed?
Answer: yes - think about why?


Shapiro-Wilk test of normality:

# Type your R code here

Solution

shapiro.test(milk$Yield)

    Shapiro-Wilk normality test

data:  milk$Yield
W = 0.98967, p-value = 0.6379

Question: Does the Shapiro-Wilk test indicate the data are normally distributed? Explain your answer.
Answer: < Type your answer here >


Solution


Question: Does the Shapiro-Wilk test indicate the data are normally distributed? Explain your answer.
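Answer: Yes. The p-value (0.6379) is greater than 0.05, so there is no evidence that the data depart from a normal distribution.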


4. Calculate the test statistic

In R we do this with the command t.test(milk$Yield, mu = …). The R output first gives the calculated t value, the degrees of freedom and the p-value; it then provides the 95% CI and the sample mean. Where mu = … is written, enter the hypothesised mean.
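For reference, the t value reported by t.test is the standard one-sample t statistic. The symbols are introduced here for illustration and are not used elsewhere in this lab: \(\bar{x}\) is the sample mean, \(s\) the sample standard deviation, \(n\) the sample size and \(\mu_0\) the hypothesised mean.

```latex
\[
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
\]
```

The further the sample mean lies from the hypothesised mean, relative to the standard error \(s / \sqrt{n}\), the larger the absolute value of t.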

# write your R code here

Solution

t.test(milk$Yield, mu = 11)

    One Sample t-test

data:  milk$Yield
t = 4.9291, df = 99, p-value = 3.323e-06
alternative hypothesis: true mean is not equal to 11
95 percent confidence interval:
 12.53485 14.60315
sample estimates:
mean of x 
   13.569 

5. Obtain P-value or critical value


Question: Does the hypothesised economic threshold lie within the confidence interval?
Answer: < Type your answer here >


Solution


Question: Does the hypothesised economic threshold lie within the confidence interval?
Answer: No. The 95% CI (12.53, 14.60) does not contain 11.
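The t value and degrees of freedom reported in the t.test output for step 4 are enough to obtain the critical value and the P-value by hand. The sketch below copies those two numbers from the output and assumes a two-sided test at α = 0.05.

```r
# Values copied from the t.test output for the milk data.
t_obs <- 4.9291
df <- 99
alpha <- 0.05

# Two-sided critical value: reject H0 when |t| exceeds it.
t_crit <- qt(1 - alpha / 2, df = df)
abs(t_obs) > t_crit

# Two-sided P-value from the t distribution.
p_value <- 2 * pt(-abs(t_obs), df = df)
p_value
```

Both routes lead to the same decision: the observed t value exceeds the critical value, and the P-value is below α.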


6. Make statistical conclusion


Question: Based on the P-value, do we accept or reject the null hypothesis?
Answer: < Type your answer here >


Solution


Question: Based on the P-value, do we accept or reject the null hypothesis?
Answer: Reject the null hypothesis


7. Write a scientific (biological) conclusion


Question: Now write a scientific (biological) conclusion based on the outcome in 6.
Answer: < Type your answer here >


Solution


Question: Now write a scientific (biological) conclusion based on the outcome in 6.
Answer: The milk yields differ significantly from the economic threshold of 11 units. In fact, the cows tested yield an average of 13.6 units (95% CI: 12.5, 14.6), which is significantly higher than the economic threshold of 11 units.


Exercise 2: Stinging trees (individual or in pairs)

Data file: ENVX1002_Data5.xlsx (Stinging sheet)

A forest ecologist, studying regeneration of rainforest communities in gaps caused by large trees falling during storms, read that seedlings of the stinging tree, Dendrocnide excelsa, will grow 1.5 m/year in direct sunlight, such as in gaps. In the gaps in her study plot, she identified 9 specimens of this species and measured them in 1998 and again 1 year later.

Does her data support the published contention that seedlings of this species will average 1.5m of growth per year in direct sunlight? Also, calculate a 95% CI for the true mean. Analyse the data in R. Due to the small sample size we have to assume the data is normal.

Fact

It was found that researchers wearing welding gloves and a full body suit were still stung by the tree. The sting is extremely painful and can last for months. The pain is caused by a neurotoxin that is injected into the skin. The tree is found in the rainforests of north-eastern Australia.

Work through the steps below individually or in pairs. Add more code chunks if required (click insert -> R on above toolbar)


  1. Choose level of significance (α)
    Answer:

Solution


  1. Choose level of significance (α)
    Answer: 0.05 is generally accepted in the scientific community.


  2. Write null and alternate hypotheses
    H0:
    H1:

Solution


  2. Write null and alternate hypotheses
    H0: \(\mu_{growth}\) = 1.5 m/year
    H1: \(\mu_{growth}\) \(\neq\) 1.5 m/year

  3. Check assumptions (normal)

Read in the data:

library(readxl)
sting <- read_excel("data/ENVX1002_Data5.xlsx", sheet = "Stinging")
sting
# A tibble: 9 × 1
  Stinging
     <dbl>
1      1.9
2      2.5
3      1.6
4      2  
5      1.5
6      2.7
7      1.9
8      1  
9      2  

Plot your data:

# Type your R code here

Solution

#qq plot
ggplot(sting, aes(sample = Stinging)) +
  stat_qq() +
  stat_qq_line()

#histogram
ggplot(sting, aes(x = Stinging)) +
  geom_histogram(binwidth = 1, fill = "lightgreen", color = "black") +
  labs(title = "Histogram of Stinging Tree Growth", x = "Growth (m)", y = "Frequency")

#Boxplot
ggplot(sting, aes(x = Stinging)) +
  geom_boxplot(fill = "lightgreen", color = "black") +
  labs(title = "Boxplot of Stinging Tree Growth", x = "Growth (m)")

Normality tests:

# Type your R code here

Solution

shapiro.test(sting$Stinging)

    Shapiro-Wilk normality test

data:  sting$Stinging
W = 0.96096, p-value = 0.8083

Question: Are the data normally distributed? Explain your answer.
Answer: < Type your answer here >


Solution


Question: Are the data normally distributed? Explain your answer.
Answer: Yes. Both the plots and the Shapiro-Wilk test suggest the data are normal (p-value > 0.05).


  4. Calculate test statistic and
  5. Obtain P-value or critical value
# Type your R code here

Solution

t.test(sting$Stinging, mu = 1.5)

    One Sample t-test

data:  sting$Stinging
t = 2.3534, df = 8, p-value = 0.04643
alternative hypothesis: true mean is not equal to 1.5
95 percent confidence interval:
 1.508055 2.291945
sample estimates:
mean of x 
      1.9 
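To see where these numbers come from, the same test can be sketched by hand in base R, using the nine growth values printed in the tibble above. The variable names are illustrative only.

```r
# Growth measurements printed in the tibble above (m/year).
growth <- c(1.9, 2.5, 1.6, 2, 1.5, 2.7, 1.9, 1, 2)
mu0 <- 1.5                     # hypothesised mean

n <- length(growth)
se <- sd(growth) / sqrt(n)     # standard error of the mean

# Test statistic, two-sided P-value and 95% confidence interval.
t_stat <- (mean(growth) - mu0) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
ci <- mean(growth) + c(-1, 1) * qt(0.975, df = n - 1) * se

round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

The results should agree with the t, p-value and confidence interval in the t.test output above, up to rounding.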

  6. Make statistical conclusion
    Answer:

Solution


  6. Make statistical conclusion
    Answer: P < 0.05, so we reject the null hypothesis \(\mu_{growth}\) = 1.5 m/year


  7. Write a scientific (biological) conclusion
    Answer:

Solution


  7. Write a scientific (biological) conclusion
    Answer: The mean growth rate of the stinging tree, Dendrocnide excelsa, differs significantly from 1.5 m/year. The mean growth rate is 1.9 m/year (95% CI: 1.51, 2.29), so the seedlings in this study grew faster than the published rate.

Check your answers with teaching staff

Thanks!

Bonus take home exercises

For each of these exercises, follow the steps outlined in the lectures (and this lab!) to test your hypotheses:

  1. Choose level of significance (α)
  2. Write null and alternate hypotheses
  3. Check assumptions (normal)
  4. Calculate test statistic
  5. Obtain P-value or critical value
  6. Make statistical conclusion
  7. Write a scientific (biological) conclusion

Exercise 1: Carrots

A farmer is growing carrots for a restaurant. The restaurant wants their carrots to be 10 cm long, so the farmer wants to check if the carrots in their field differ significantly from the needed length.

#Read in data

carrots <- c(7, 7, 13, 5, 13, 10, 11, 12, 10,  9)

Solution


  1. Choose level of significance (α)
    Answer: 0.05 is generally accepted in the scientific community.

  2. Write null and alternate hypotheses

H0: \(\mu_{carrot}\) = 10 cm
H1: \(\mu_{carrot}\) \(\neq\) 10 cm

  3. Check assumptions (normal)
#boxplot
boxplot(carrots)

#histogram
hist(carrots)

#shapiro test

shapiro.test(carrots)

    Shapiro-Wilk normality test

data:  carrots
W = 0.93961, p-value = 0.5486

The data are normally distributed

  4. Calculate test statistic and
  5. Obtain P-value or critical value
#t test
t.test(carrots, mu = 10)

    One Sample t-test

data:  carrots
t = -0.35006, df = 9, p-value = 0.7343
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
  7.761337 11.638663
sample estimates:
mean of x 
      9.7 
  6. Make statistical conclusion

p > 0.05, so we retain the null hypothesis

  7. Write a scientific (biological) conclusion

The carrot length does not differ significantly from 10 cm. The farmer’s carrots have a mean length of 9.7 cm (95% CI: 7.76, 11.64), which is consistent with the needed length.


Exercise 2: Penguins

Rey has just landed on Earth and noticed that penguins look really similar to porgs. Using weight as the point of comparison, she wants to know if two different penguin species weigh the same as her pet porg Stevie, who weighs 4000g.

We will be using the Palmer penguin dataset to test if chinstrap and gentoo penguins weigh the same as Stevie.

#install.packages("palmerpenguins")
library(palmerpenguins)
library(tidyverse) # provides %>% and filter(), used below

2.1 Chinstrap

# keep the Chinstrap rows and drop rows with any missing value
chinstrap <- penguins %>%
  filter(species == "Chinstrap") %>%
  na.omit()

Solution


  1. Choose level of significance (α)
    Answer: 0.05 is generally accepted in the scientific community.

  2. Write null and alternate hypotheses

H0: \(\mu_{chinstrap}\) = 4000g
H1: \(\mu_{chinstrap}\) \(\neq\) 4000g

  3. Check assumptions (normal)
#Load library
library(tidyverse)
#qqplot
ggplot(chinstrap, aes(sample = body_mass_g))+
  geom_qq()+
  geom_qq_line()

#boxplot
ggplot(chinstrap, aes(x = body_mass_g))+
  geom_boxplot()

#histogram
ggplot(chinstrap, aes(x = body_mass_g))+
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#shapiro test
shapiro.test(chinstrap$body_mass_g)

    Shapiro-Wilk normality test

data:  chinstrap$body_mass_g
W = 0.98449, p-value = 0.5605

The data are normally distributed

  4. Calculate test statistic and
  5. Obtain P-value or critical value
#t test
t.test(chinstrap$body_mass_g, mu = 4000)

    One Sample t-test

data:  chinstrap$body_mass_g
t = -5.7268, df = 67, p-value = 2.631e-07
alternative hypothesis: true mean is not equal to 4000
95 percent confidence interval:
 3640.059 3826.117
sample estimates:
mean of x 
 3733.088 
  6. Make statistical conclusion

p < 0.05, so we reject the null hypothesis

  7. Write a scientific (biological) conclusion

Chinstrap penguins do not weigh the same as Stevie. On average, chinstrap penguins weigh 3733 g (95% CI: 3640, 3826), so they are lighter.


2.2 Gentoo

# keep the Gentoo rows and drop rows with any missing value
gentoo <- penguins %>%
  filter(species == "Gentoo") %>%
  na.omit()

Solution


  1. Choose level of significance (α)
    Answer: 0.05 is generally accepted in the scientific community.

  2. Write null and alternate hypotheses

H0: \(\mu_{gentoo}\) = 4000g
H1: \(\mu_{gentoo}\) \(\neq\) 4000g

  3. Check assumptions (normal)
#qqplot
ggplot(gentoo, aes(sample = body_mass_g))+
  geom_qq()+
  geom_qq_line()

#boxplot
ggplot(gentoo, aes(x = body_mass_g))+
  geom_boxplot()

#histogram
ggplot(gentoo, aes(x = body_mass_g))+
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#shapiro test
shapiro.test(gentoo$body_mass_g)

    Shapiro-Wilk normality test

data:  gentoo$body_mass_g
W = 0.98606, p-value = 0.2605

The data are normally distributed

  4. Calculate test statistic and
  5. Obtain P-value or critical value
#t test
t.test(gentoo$body_mass_g, mu = 4000)

    One Sample t-test

data:  gentoo$body_mass_g
t = 23.764, df = 118, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 4000
95 percent confidence interval:
 5001.403 5183.471
sample estimates:
mean of x 
 5092.437 
  6. Make statistical conclusion

p < 0.05, so we reject the null hypothesis

  7. Write a scientific (biological) conclusion

Gentoo penguins do not weigh the same as Stevie. On average, gentoo penguins weigh 5092 g (95% CI: 5001, 5183), so they are heavier.
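Because every bonus exercise repeats the same steps, the checks can be collected in one small function. The sketch below is a hypothetical helper (one_sample_steps is not part of the lab or of any package) that uses only base R and is tried out on the carrot lengths from bonus exercise 1. It does not replace looking at the plots in step 3.

```r
# Hypothetical helper: runs the normality test and the one-sample t-test,
# then applies the decision rule at significance level alpha.
one_sample_steps <- function(x, mu0, alpha = 0.05) {
  x <- x[!is.na(x)]                                  # drop missing values
  norm_p <- shapiro.test(x)$p.value                  # step 3: normality test
  tt <- t.test(x, mu = mu0, conf.level = 1 - alpha)  # steps 4 and 5
  list(
    shapiro_p = norm_p,
    t = unname(tt$statistic),
    df = unname(tt$parameter),
    p_value = tt$p.value,
    conf_int = as.numeric(tt$conf.int),
    mean = unname(tt$estimate),
    decision = if (tt$p.value < alpha) "reject H0" else "retain H0"  # step 6
  )
}

# Usage with the carrot lengths from bonus exercise 1.
carrots <- c(7, 7, 13, 5, 13, 10, 11, 12, 10, 9)
res <- one_sample_steps(carrots, mu0 = 10)
res$decision
```

The same call could be made with chinstrap$body_mass_g or gentoo$body_mass_g and mu0 = 4000; the scientific conclusion in step 7 still has to be written by hand.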


Attribution

This lab was developed using resources that are available under a Creative Commons Attribution 4.0 International license, made available on the SOLES Open Educational Resources repository.